LANGUAGE MODELS FOR TOPIC TRACKING: The importance of score normalization
Abstract
Generative unigram language models have proven to be a simple yet effective model for information retrieval tasks. In contrast to ad-hoc retrieval, topic tracking requires that matching scores be comparable across topics. Several ranking functions based on generative language models (straight likelihood, likelihood ratio, normalized likelihood ratio, and the related Kullback-Leibler divergence) are evaluated in two orientations. The best performance is achieved by the models based on a normalized log-likelihood ratio. A key component of these models is the a-priori probability of a story with respect to a common reference dis...
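To make the idea of a normalized log-likelihood ratio concrete, the following is a minimal sketch and not the paper's implementation. It assumes one common formulation: the story's own unigram distribution weights the log of the ratio between a smoothed topic model and a background (reference) model. The function names, the interpolation weight `lam`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def nllr(story_tokens, topic_tokens, background, lam=0.5):
    """Length-normalized log-likelihood ratio of a story against a topic.

    Assumed formulation (illustrative, not taken from the source):
        sum_w P(w | story) * log(P_smoothed(w | topic) / P(w | background))
    `lam` is a hypothetical interpolation weight and must be below 1.
    """
    story = unigram_model(story_tokens)
    topic = unigram_model(topic_tokens)
    score = 0.0
    for w, p_story in story.items():
        p_bg = background.get(w, 0.0)
        if p_bg == 0.0:
            continue  # skip words the background model has never seen
        # Linear interpolation keeps the topic probability above zero.
        p_topic = lam * topic.get(w, 0.0) + (1.0 - lam) * p_bg
        score += p_story * math.log(p_topic / p_bg)
    return score


# Toy data, invented for illustration only.
background = unigram_model(
    "the election vote senate the football match goal the the".split()
)
topic = "the election vote senate".split()
on_topic_story = "the senate vote".split()
off_topic_story = "the football match".split()

on_score = nllr(on_topic_story, topic, background)
off_score = nllr(off_topic_story, topic, background)
```

Because the story enters the score only through relative frequencies, repeating a story's text leaves its score unchanged. Under this sketch's assumptions, that is what makes scores comparable across stories of different lengths.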
Similar resources
TDT-2004: Adaptive Topic Tracking at Maryland
A topic tracking system that combines elements from vector space and language modeling frameworks to compute document scores is described. The model is used for both the traditional TDT topic tracking evaluation design and the new supervised adaptive topic tracking evaluation. Results indicate that supervised adaptation and score normalization should be more closely coupled, and that current te...
Model Selection Based on Tracking Interval Under Unified Hybrid Censored Samples
The aim of statistical modeling is to identify the model that most closely approximates the underlying process. The Akaike information criterion (AIC) is commonly used for model selection, but the precise value of AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria to compare two rival models under unified hybrid cens...
Tracking Interval for Type II Hybrid Censoring Scheme
The purpose of this paper is to obtain the tracking interval for the difference of expected Kullback-Leibler risks of two models under a Type II hybrid censoring scheme. This interval helps us to evaluate proposed models in comparison with each other. We derive a statistic which tracks the difference of expected Kullback-Leibler risks between maximum likelihood estimators of the distribution in two diff...
Online Persian Handwriting Recognition Using a Language Model and Reduction of User Writing Rules
The joined-up, cursive form of Persian words, the immense variety of its scripts, and the different shapes Persian letters take depending on their position within a word have made Persian handwriting recognition a serious challenge. The major shortcoming of most common recognition methods is their inattention to sentence context, which causes the use of a word with correct appea...
Publication date: 2003